AITopics | neural tts

neural tts

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

Liang, Zheng, Song, Zheshu, Ma, Ziyang, Du, Chenpeng, Yu, Kai, Chen, Xie

arXiv.org Artificial Intelligence, Jun-14-2023

Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and exhibit excellent performance in general speech recognition. However, there remain several challenging scenarios that E2E models handle poorly, such as code-switching and named entity recognition (NER). Data augmentation is a common and effective practice for these two scenarios. However, current data augmentation methods mainly rely on audio splicing and text-to-speech (TTS) models, which might result in discontinuous, unrealistic, and less diversified speech. To mitigate these potential issues, we propose a novel data augmentation method that applies a text-based speech editing model. The augmented speech from speech editing systems is more coherent, more diversified, and more akin to real speech. The experimental results on code-switching and NER tasks show that our proposed method can significantly outperform the audio splicing and neural TTS based data augmentation systems.

artificial intelligence, natural language, recognition, (15 more...)

2306.08588

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages

Do, Phat, Coler, Matt, Dijkstra, Jelske, Klabbers, Esther

arXiv.org Artificial Intelligence, May-30-2023

We train a MOS prediction model based on wav2vec 2.0 using the open-access data sets BVCC and SOMOS. Our test with neural TTS data in the low-resource language (LRL) West Frisian shows that pre-training on BVCC before fine-tuning on SOMOS leads to the best accuracy for both fine-tuned and zero-shot prediction. Further fine-tuning experiments show that using more than 30 percent of the total data does not lead to significant improvements. In addition, fine-tuning with data from a single listener shows promising system-level accuracy, supporting the viability of one-participant pilot tests. These findings can all assist the resource-conscious development of TTS for LRLs by progressing towards better zero-shot MOS prediction and informing the design of listening tests, especially in early-stage evaluation.
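The distinction between utterance-level and system-level accuracy in the abstract above can be made concrete with a small sketch. It assumes that system-level scores are the mean of the utterance-level scores for each TTS system, and that agreement is measured with a Spearman rank correlation, which is one common choice for MOS prediction; the abstract does not state which metric the authors used. All function names, system IDs, and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def system_level_scores(utterances):
    """Average utterance-level scores per system.

    utterances: iterable of (system_id, score) pairs.
    """
    by_system = defaultdict(list)
    for system_id, score in utterances:
        by_system[system_id].append(score)
    return {system_id: mean(scores) for system_id, scores in by_system.items()}


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical listener ratings and model predictions for three systems.
human = [("A", 4.0), ("A", 4.2), ("B", 3.0), ("B", 3.4), ("C", 2.0), ("C", 2.2)]
predicted = [("A", 3.8), ("A", 3.9), ("B", 3.5), ("B", 3.1), ("C", 2.5), ("C", 2.4)]

human_sys = system_level_scores(human)
pred_sys = system_level_scores(predicted)
systems = sorted(human_sys)
srcc = spearman([human_sys[s] for s in systems], [pred_sys[s] for s in systems])
print(f"system-level SRCC: {srcc:.2f}")
```

Averaging before correlating is why a predictor can rank systems well even when its individual utterance scores are noisy, which is one plausible reading of the single-listener result reported above.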

large language model, machine learning, somo, (22 more...)

2305.19396

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Netherlands (0.05)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.53)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.41)

EfficientSpeech: An On-Device Text to Speech Model

Atienza, Rowel

arXiv.org Artificial Intelligence, May-23-2023

State-of-the-art (SOTA) neural text-to-speech (TTS) models can generate natural-sounding synthetic voices. These models are characterized by large memory footprints and a substantial number of operations due to the long-standing focus on speech quality with cloud inference in mind. Neural TTS models are generally not designed to perform standalone speech synthesis on resource-constrained edge devices without Internet access. In this work, an efficient neural TTS model called EfficientSpeech that synthesizes speech on an ARM CPU in real time is proposed. EfficientSpeech uses a shallow non-autoregressive pyramid-structure transformer forming a U-Network. EfficientSpeech has 266k parameters and consumes only 90 MFLOPS, or about 1% of the size and amount of computation of modern compact models such as Mixer-TTS. EfficientSpeech achieves an average mel generation real-time factor of 104.3 on an RPi4. Human evaluation shows only a slight degradation in audio quality compared to FastSpeech2.
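The real-time factor quoted above can be read as a simple ratio. The sketch below assumes the figure means seconds of audio represented by the generated mel spectrogram per second of compute, so that higher is faster; this fits a value above 100 being reported as a strength, although some papers use the inverse convention. The function names, hop length, sampling rate, frame count, and timing are hypothetical values chosen for illustration and are not taken from the paper.

```python
def mel_duration_seconds(n_frames: int, hop_length: int, sample_rate: int) -> float:
    """Audio duration covered by a mel spectrogram with n_frames frames."""
    return n_frames * hop_length / sample_rate


def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio produced per second of compute (assumed convention)."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# Hypothetical measurement: 862 mel frames generated in 0.1 s of compute.
audio_s = mel_duration_seconds(n_frames=862, hop_length=256, sample_rate=22050)
rtf = real_time_factor(audio_s, compute_seconds=0.1)
print(f"{audio_s:.2f} s of audio, real-time factor {rtf:.1f}")
```

Under this convention, a real-time factor of 104.3 would mean that roughly 104 seconds of mel spectrogram are generated per second of compute, before any vocoder cost is counted.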

artificial intelligence, efficientspeech, machine learning, (15 more...)

2305.13905

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > Philippines (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.62)

Siri gains a new gender-neutral voice option in latest iOS update – TechCrunch

#artificialintelligence, Feb-24-2022, 19:50:11 GMT

Apple has developed a new Siri voice, now available in the beta versions of its iOS 15.4 software, that doesn't sound obviously male or female. The decision to introduce a gender-neutral voice is one that sees the tech giant taking yet another step away from the criticism that, historically, digital assistants have reinforced unfair gender stereotypes. Over the years, industry observers and experts have argued that the creation of voice assistants with female-sounding names, like Alexa, Siri and Cortana, which also speak with female-sounding voices, implied that women should be the ones to do your bidding at any time and even take your abuse. A U.N. study additionally called out the female-voiced assistants and their submissive and sometimes even flirty and coy styles. More problematically, the decision to make so many of the virtual assistants female by default was likely driven by a lack of diversity in the teams responsible for building our everyday technology.

new gender-neutral voice option, new siri voice, siri voice, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

A Survey on Neural Speech Synthesis

#artificialintelligence, Jul-1-2021, 00:18:36 GMT

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in the speech, language, and machine learning communities and has broad applications in industry. With the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. In this paper, we conduct a comprehensive survey on neural TTS, aiming to provide a good understanding of current research and future trends. We focus on the key components in neural TTS, including text analysis, acoustic models and vocoders, and several advanced topics, including fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS. We further summarize resources related to TTS (e.g., datasets, open-source implementations) and discuss future research directions.

neural speech synthesis, neural tts, tts

Genre: Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)
